Abstract

A feasibility study of the measurement of the top-Higgs Yukawa coupling at a
future linear $e^+e^-$ collider operating at $\sqrt{s} = 800$ GeV is presented. Compared
to previous studies, much effort has been put into a "realistic simulation" by
including irreducible and reducible backgrounds, realistic detector effects and
reconstruction procedures, and finally a multivariate analysis. Both hadronic and semileptonic
decay channels have been considered.



1 Introduction 

The theory of electroweak interactions has been successfully tested so far to an extremely
high degree of accuracy. However, one of its key elements, the Higgs mechanism, remains 
to be tested experimentally. It is through the interaction with the ground state Higgs field 
that the fundamental particles acquire mass, which in turn sets the scale of the coupling 
with the Higgs boson. Once the Higgs boson is found (if ever), all its properties have to be 
accurately measured: mass, width, etc., and indeed its couplings to bosons and fermions.
The couplings to the Z and W gauge bosons can be measured from the Higgs-strahlung
process, $Z \to ZH$ [?], and the fusion processes, $WW, ZZ \to H$ [?]. On the other hand,
the top quark provides a unique opportunity to measure the Higgs Yukawa coupling to 
fermions. Being proportional to the fermion mass, 

$$g_{ffH} = \frac{m_f}{v}, \qquad (1.1)$$

with $v = (\sqrt{2}\,G_F)^{-1/2} \simeq 246$ GeV, the top-Higgs Yukawa coupling is the largest among
the different fermions: $g^2_{ttH} \simeq 0.5$, to be compared for instance with $g^2_{bbH} \simeq 4 \times 10^{-4}$.
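The size of these couplings can be checked with a few lines of arithmetic. The sketch below is only illustrative: the value of the Fermi constant is the standard one rather than a number quoted here, the bottom-quark mass of 4.9 GeV is an assumed value chosen for the example, and the function names are hypothetical.

```python
import math

# Fermi constant in GeV^-2 (standard value, not quoted in the text).
G_FERMI = 1.16637e-5


def higgs_vev(g_fermi: float = G_FERMI) -> float:
    """Vacuum expectation value v = (sqrt(2) * G_F)^(-1/2), in GeV."""
    return (math.sqrt(2.0) * g_fermi) ** -0.5


def yukawa_coupling(fermion_mass_gev: float, vev_gev: float) -> float:
    """Tree-level Yukawa coupling g_ffH = m_f / v."""
    return fermion_mass_gev / vev_gev


v = higgs_vev()
g_ttH = yukawa_coupling(175.0, v)  # m_t = 175 GeV, the value adopted in this study
g_bbH = yukawa_coupling(4.9, v)    # m_b = 4.9 GeV is an assumed, illustrative value

print(f"v       = {v:.1f} GeV")
print(f"g_ttH^2 = {g_ttH ** 2:.2f}")
print(f"g_bbH^2 = {g_bbH ** 2:.1e}")
```

With these inputs the squared couplings come out close to 0.5 and $4 \times 10^{-4}$, which is why the quoted numbers are read here as $g^2$ rather than $g$.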

The process $e^+e^- \to t\bar{t}H$ provides a chance for a direct measurement of the top-Higgs
Yukawa coupling [?] in the "Light Higgs Scenario" ($M_H < 2m_t$), which seems to be
favored by the present precision electroweak data ($M_H = 76^{+\ldots}_{-47}$ GeV as reported in [?]).
On the other hand, the current limit from direct searches at LEP2 is $M_H > 95.2$ GeV at
95% CL [?]. The total cross-section depends sensitively on the top-Higgs Yukawa coupling,
which can thus be inferred from the comparison of the measured total cross-section with
the theoretical expectation as a function of $g_{ttH}$. This measurement of $g_{ttH}$ is direct, as
compared to the indirect determination via its effect on the interquark potential near the
$t\bar{t}$ production threshold, which affects some threshold observables [?].

In this study we assume that the MSM Higgs boson has already been
discovered and its mass measured to be $M_H = 120$ GeV. For $M_H = 120$ GeV, the Higgs
decays dominantly to $b\bar{b}$ (BR($H \to b\bar{b}$) $\simeq 77\%$), and assuming BR($t \to Wb$) = 100%, this
leads to multi-jet event topologies involving 4 b-jets in the final state. Therefore, one of 
the crucial experimental aspects will be flavor tagging. 

Previous studies on the feasibility of this measurement have already been performed [?],
but they assumed too simplistic a simulation of detector effects (in particular b-tagging,
which is critical here) and/or considered too few background processes. More
complete studies have recently been presented [?].

2 Theoretical Scenario 

At lowest order there are 5 diagrams contributing to this process, as shown in Fig. 1. The
dominant contribution comes from $\gamma$-exchange with the H being radiated off the $t$ or the
$\bar{t}$. The diagram in which the H is radiated off the Z constitutes just a small correction,
so that the total cross-section is to a good approximation $\propto g^2_{ttH}$ (see Fig. 2a).

Figure 1: Tree-level diagrams contributing to the process $t\bar{t}H$.



As can be observed in Fig. 2a, the total cross-section decreases at low $\sqrt{s}$ due to
phase-space restrictions and at high $\sqrt{s}$ due to unitarity. Radiative effects in the initial
state (initial state radiation and beamstrahlung) become increasingly important at high
$\sqrt{s}$ and significantly distort the $t\bar{t}H$ lineshape, as shown in Fig. 2b. The main effect
is to shift the maximum of the cross-section towards higher $\sqrt{s}$, which for $M_H = 120$
GeV is around $\sqrt{s} \simeq 800$ GeV. Initial state radiation turns out to be the dominant
radiative effect in reducing the sensitivity of the total cross-section to the
Yukawa coupling.

Recently, the NLO QCD corrections to the total cross-section have been computed [?].
They turn out to be important at moderate energies, due to rescattering diagrams
generated by Coulombic gluon exchange between the top quarks near the $t\bar{t}$
threshold. As a consequence, the total cross-section can be enhanced by a factor of 2 with
respect to LO whereas, since virtual and soft-gluon radiation are the dominant corrections,
the Higgs and top quark energy and angular distributions are hardly changed. However,
above threshold these corrections to the total cross-section are small ($\sim 5\%$ at $\sqrt{s} = 1$
TeV) and negative, so for the purpose of this analysis they can be safely neglected.

Figure 2: Total cross-section for $t\bar{t}H$ at lowest order, for $m_t = 175$ GeV and $M_H = 120$ GeV, as a
function of the center-of-mass energy. In (a) the two contributions, H radiated off the t (dashed) and H
radiated off the Z (dotted), are shown separately. The effect of radiative processes in the initial state
(initial state radiation and beamstrahlung) on the total cross-section is illustrated in (b).

Since the Yukawa coupling is determined from the cross-section measurement, it is
straightforward to estimate the expected statistical and some systematic uncertainties
on $g_{ttH}$ for a given selection with efficiency $\epsilon$ and purity $\rho$, applied on a data sample
corresponding to an integrated luminosity $L$. The systematic term,
$(\Delta g_{ttH}/g_{ttH})_{\rm syst}$, accounts for the uncertainties in the effective background
cross-section (after selection), the integrated luminosity and the selection signal efficiency.
Both estimates involve the so-called "sensitivity factors", $S_{\rm stat}(g^2_{ttH})$ and
$S_{\rm syst}(g^2_{ttH})$, which are functions of $\sqrt{s}$ for a fixed $M_H$.
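These relations can be sketched from simple event counting. The block below is a reconstruction under stated assumptions, not necessarily the exact expressions used in the analysis: it assumes $N = (\epsilon\,\sigma_{t\bar{t}H} + \sigma^{BG}_{\rm eff})\,L$ selected events with purity $\rho$, propagates each uncertainty linearly, takes $\oplus$ to mean a sum in quadrature, and assumes that the sensitivity factors are derivatives with respect to $g^2_{ttH}$. Strictly, this counting yields $\Delta g^2_{ttH}$, which is numerically close to $\Delta g_{ttH}/g_{ttH}$ only because $2g^2_{ttH} \simeq 1$.

```latex
% Assumed reconstruction from event counting; not a transcription of the original equations.
\begin{align}
\left(\frac{\Delta g_{ttH}}{g_{ttH}}\right)_{\rm stat} &\simeq
  \frac{1}{S_{\rm stat}(g^2_{ttH})\,\sqrt{\epsilon\,\rho\,L}}, \\
\left(\frac{\Delta g_{ttH}}{g_{ttH}}\right)_{\rm syst} &\simeq
  \frac{1}{S_{\rm syst}(g^2_{ttH})}
  \left[\frac{1-\rho}{\rho}\,\frac{\Delta\sigma^{BG}_{\rm eff}}{\sigma^{BG}_{\rm eff}}
  \oplus \frac{1}{\rho}\,\frac{\Delta L}{L}
  \oplus \frac{\Delta\epsilon}{\epsilon}\right], \\
S_{\rm stat}(g^2_{ttH}) &= \frac{1}{\sqrt{\sigma_{t\bar{t}H}}}
  \left|\frac{d\sigma_{t\bar{t}H}}{dg^2_{ttH}}\right|, \qquad
S_{\rm syst}(g^2_{ttH}) = \frac{1}{\sigma_{t\bar{t}H}}
  \left|\frac{d\sigma_{t\bar{t}H}}{dg^2_{ttH}}\right|.
\end{align}
```

With these definitions $S_{\rm stat}$ carries units of fb$^{1/2}$ and $S_{\rm syst}$ is dimensionless, in line with the values quoted in this section.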

The sensitivity factors as a function of the center-of-mass energy are shown in Fig. 3a
for $M_H = 120$ GeV. As can be observed, $S_{\rm stat}$ reaches a "plateau" for $\sqrt{s} > 700$
GeV, whereas $S_{\rm syst}$ is essentially independent of $\sqrt{s}$. At $\sqrt{s} = 800$ GeV, the respective
values are $S_{\rm stat} \simeq 3.09$ fb$^{1/2}$ and $S_{\rm syst} \simeq 1.92$. Therefore, assuming an ideal selection
($\epsilon = 100\%$ and $\rho = 100\%$), a statistical precision of around 1% could be achieved in $g_{ttH}$
for $\sqrt{s} > 700$ GeV and $L = 1000$ fb$^{-1}$. In a more realistic situation of $\epsilon = 5\%$ and
$\rho = 50\%$, the statistical uncertainty would be $\sim 6.5\%$, whereas the systematic uncertainty
would be dominated by the uncertainty in the background normalization, if one assumed
that both the signal selection efficiency and integrated luminosity can be known at the 1%
level or better.
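The two precisions quoted above can be cross-checked numerically. The sketch below assumes that the statistical estimate takes the form $1/(S_{\rm stat}\sqrt{\epsilon\,\rho\,L})$; this form is an assumption made for illustration, chosen because it reproduces the quoted numbers, and the function name is hypothetical.

```python
import math


def stat_uncertainty(s_stat: float, efficiency: float, purity: float, lumi_fb: float) -> float:
    """Relative statistical uncertainty, assuming 1 / (S_stat * sqrt(eff * purity * L))."""
    return 1.0 / (s_stat * math.sqrt(efficiency * purity * lumi_fb))


S_STAT = 3.09   # fb^(1/2), value quoted at sqrt(s) = 800 GeV
LUMI = 1000.0   # fb^-1

ideal = stat_uncertainty(S_STAT, 1.00, 1.00, LUMI)      # ideal selection
realistic = stat_uncertainty(S_STAT, 0.05, 0.50, LUMI)  # 5% efficiency, 50% purity

print(f"ideal selection:     {100 * ideal:.1f}%")
print(f"realistic selection: {100 * realistic:.1f}%")
```

The factor of about 6 between the two cases comes entirely from $\sqrt{\epsilon\,\rho} = \sqrt{0.025}$.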

Figure 3: Sensitivity of the total cross-section to the top-Higgs Yukawa coupling as a function of the
center-of-mass energy for $M_H = 120$ GeV and including radiative effects: (a) "sensitivity factors", (b)
statistical uncertainty for an ideal selection assuming an integrated luminosity of 1000 fb$^{-1}$.



3 Simulation Aspects 

In order to simulate the signal and the $t\bar{t}Z$ background, we have written a Monte Carlo
event generator using the squared matrix element as computed by the program
MADGRAPH [11] and the HELAS [12] subroutines. The top and Higgs masses are assumed to
be $m_t = 175$ GeV and $M_H = 120$ GeV, respectively. The top width is computed including
NLO QCD corrections. Tops are generated off-shell by including the corresponding
Breit-Wigner distributions in the differential cross-section. The total cross-section and
differential distributions are found to be in good agreement with the calculation in [?].
The remaining backgrounds have been generated with PYTHIA [13]. Interferences between
signal and backgrounds have been neglected. The event samples have been generated at
$\sqrt{s} = 800$ GeV, including initial state radiation and beamstrahlung. Initial state radiation
has been considered in the structure function approach and beamstrahlung has been
generated with the aid of the CIRCE program [14]. Fragmentation, hadronization and
particle decays are handled by JETSET [13], with parameters tuned to LEP2 data.



3.1 Detector Simulation 

Once the events have been generated, they are processed through a fast simulation [15] of
the response of a detector for the TESLA linear collider. The detector components, which
are assumed to be:

• a vertex detector,

• a tracker system with main tracker, forward tracker and forward muon tracker,

• an electromagnetic calorimeter,

• a hadronic calorimeter and

• a luminosity detector,

are implemented according to the TESLA Conceptual Design Report [16].

This fast detector simulation provides a flexible tool since its performance characteristics
can be varied within a wide range. The calorimeter response is treated in a
realistic way using a parametrization of the electromagnetic and hadronic shower deposits
obtained from a full GEANT simulation [17] and including a cluster finding algorithm.
Pattern recognition is emulated by means of a complete cross-reference table between
generated particles and detector response. The output of the program consists of a list of
reconstructed objects: electrons, gammas, muons, charged and neutral hadrons and
unresolved calorimeter clusters, as a result of an idealized Energy Flow algorithm incorporating
track-cluster matching.

3.2 B-tagging 

Jets coming from b and c-quark decays are tagged based on the non-zero lifetime of these 
quarks, using the Vertex Detector (VDET). In this study we have assumed the performance 
of a CCD VDET in a 1 cm radius beampipe. 

In order to look for this lifetime signal, we have chosen to use the 3D impact parameter 
(IP) of each charged track (distance of closest approach between the track and the b 
production point). Since the statistical resolution of the IP varies strongly from one track 
to another, we use the estimated statistical significance of the measured IP to define our 
tag. The b-tagging algorithm is kept simple so that the success of the analysis does not 
depend on detector details. More efficient algorithms can be developed by making use of 
multivariate techniques, such as Neural Networks. 

In Fig. 4, the IP significance distributions for different Z hadronic decays are compared.
The lifetime signature can be clearly seen for $Z \to b\bar{b}$ in the positive tail.

We will use the IP significance distribution for non-lifetime tracks (those originating
from $Z \to u\bar{u}, d\bar{d}, s\bar{s}$) to define, for each track, a probability "to be consistent with
originating from the primary vertex". This information can then be combined to obtain
a probability per jet or per event [18].
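How the track probabilities are combined is not spelled out here. One widely used choice in LEP-era lifetime tags, shown below purely as an assumption about what such a combination may look like, is $P_N = \Pi \sum_{j=0}^{N-1} (-\ln \Pi)^j / j!$ with $\Pi$ the product of the $N$ track probabilities. The function name is hypothetical and each input probability is assumed to lie in (0, 1].

```python
import math


def combined_probability(track_probs):
    """Combine N track probabilities into one jet (or event) probability.

    Uses P_N = Pi * sum_{j<N} (-ln Pi)^j / j!, where Pi is the product of the inputs.
    Assumes 0 < p <= 1 for every track; an empty list gives 1 (no lifetime evidence).
    """
    if not track_probs:
        return 1.0
    log_product = sum(math.log(p) for p in track_probs)
    series = sum((-log_product) ** j / math.factorial(j)
                 for j in range(len(track_probs)))
    return math.exp(log_product) * series


light_like = combined_probability([0.6, 0.4, 0.8])     # tracks compatible with the vertex
b_like = combined_probability([0.01, 0.02, 0.05])      # several displaced tracks

print(f"light-like jet: {light_like:.3f}")
print(f"b-like jet:     {b_like:.2e}")
```

For a single track the combination returns the track probability itself, and jets with several displaced tracks are pushed towards zero, which is the behaviour needed for the "lowest probability" ordering used later.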

In order to test the performance of such b-tagging, we have estimated its efficiency and
purity for a given cut on the jet probability. To do so, the Monte Carlo generated quarks
are assigned to the reconstructed jets by a matching algorithm which associates those
quark-jet pairs with minimum invariant mass, starting from the most energetic quark.
An efficiency $\epsilon_b$ and purity $\rho_b$ can then be defined as:

$$\epsilon_b = \frac{n_{b,{\rm corr}}}{n_b}, \qquad \rho_b = \frac{n_{b,{\rm corr}}}{n_{b,{\rm tag}}},$$

where $n_{b,{\rm corr}}$ is the number of b-jets correctly tagged, $n_{b,{\rm tag}}$ is the total number of tagged
b-jets, and $n_b$ is the actual number of b-jets in the event. Defined in this way, the purity
measures the misidentification probability of jets in a given process.

Figure 4: Track 3D impact parameter significance for Z hadronic events at $\sqrt{s} \simeq 100$ GeV.

The b-tagging efficiencies and purities shown in Fig. 5 as a function of the jet probability
cut correspond to a sample of signal events where the W leptonic decays have been
switched off and the Higgs has been forced to decay into $b\bar{b}$ ($t\bar{t}H \to q\bar{q}q\bar{q}b\bar{b}b\bar{b}$). It should
be noted that, due to the high multiplicity of such events, the probability that the jet
clustering algorithm assigns "lifetime tracks" (coming from the b-jets) to a light-quark (uds)
jet is large. This leads to rather low values of the b-tagging efficiency, such as $\epsilon_b \simeq 80\%$ for
a purity of $\rho_b \simeq 60\%$. In order to quantify the performance degradation of the b-tagging
as the event multiplicity increases, this algorithm has been tested with ZZ events, where
one of the Z bosons has been forced to decay into b quarks and the other into light quarks.
The b-tagging efficiency achieved in this case is $\epsilon_b \simeq 80\%$ for a purity of $\rho_b \simeq 80\%$.

Given the low $\epsilon_b$ values for $t\bar{t}H$ events, the efficiency to tag correctly the four b-jets
of a signal event by fixing a probability cut will in general be very small. In order not
to reduce the signal efficiency drastically, we will not use the number of b-jets found for
a certain lifetime probability as a selection cut. Instead, for every event, we will define
as b-jets those four with the lowest probability (to originate from the primary vertex).
Applied to purely hadronic signal events, for example, this algorithm will tag the four
correct b-jets in $\sim 37\%$ of the cases, and at least three of them in $\sim 88\%$ of the cases.

Figure 5: b-tagging efficiency and purity for $t\bar{t}H \to q\bar{q}q\bar{q}b\bar{b}b\bar{b}$ events as a function of the lifetime
probability cut.
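The choice of the four most b-like jets described above amounts to a simple ranking. A minimal sketch follows, in which the function name and the example probabilities are hypothetical and each jet is represented only by its probability to originate from the primary vertex.

```python
def pick_b_jets(jet_probabilities, n_b_jets=4):
    """Return the indices of the n_b_jets jets with the lowest primary-vertex probability."""
    ranked = sorted(range(len(jet_probabilities)), key=lambda i: jet_probabilities[i])
    return ranked[:n_b_jets]


# Hypothetical 8-jet event: one probability per reconstructed jet.
jet_probabilities = [0.90, 0.01, 0.30, 0.002, 0.70, 0.05, 0.60, 0.04]

b_jet_indices = pick_b_jets(jet_probabilities)
sum_b_probabilities = sum(jet_probabilities[i] for i in b_jet_indices)

print("b-jet candidates (most b-like first):", b_jet_indices)
print(f"sum of their probabilities: {sum_b_probabilities:.3f}")
```

Because no threshold is applied, every event yields exactly four b-jet candidates, and the sum of their probabilities remains available as a discriminating variable.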

4 Experimental Analysis 

The experimental analysis is performed assuming a total integrated luminosity of 1000
fb$^{-1}$, which can be collected in around 3 years of running at $\mathcal{L} = 10^{34}$ cm$^{-2}$ s$^{-1}$.
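The running time follows from a unit conversion. The sketch below assumes uninterrupted running at constant instantaneous luminosity, which is an idealization introduced here and not a statement about the machine schedule; the function name is hypothetical.

```python
INV_CM2_PER_INV_FB = 1.0e39            # 1 fb^-1 = 1e39 cm^-2, since 1 fb = 1e-39 cm^2
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # calendar year, 100% live time assumed


def running_time_years(integrated_lumi_fb: float, inst_lumi_cm2_s: float) -> float:
    """Years of continuous running needed to collect the given integrated luminosity."""
    seconds = integrated_lumi_fb * INV_CM2_PER_INV_FB / inst_lumi_cm2_s
    return seconds / SECONDS_PER_YEAR


years = running_time_years(1000.0, 1.0e34)
print(f"{years:.1f} years")
```

Under this assumption 1000 fb$^{-1}$ corresponds to $10^8$ s of data taking, consistent with the "around 3 years" quoted above.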

Both semileptonic and fully hadronic decay channels have been considered. In spite of
the apparently clean signature of both channels ($\geq 6$ jets in the final state, out of which
$\geq 4$ are b-jets, multi-jet invariant mass constraints, etc.), the measurement has many
difficulties, among which:

• tiny signal ($\sim 2.6$ fb) on top of backgrounds about 3 orders of magnitude larger: in
Table 1, the total cross-section for the signal and the different backgrounds considered
is listed together with the number of generated events.

• limitations of jet-clustering algorithms in properly reconstructing multi-jets in the 
final state due to hard gluon radiation, jet mixing, etc, 

• degradation of b-tagging performance due to hard gluon radiation and jet mixing. 

Table 1: Total cross-section for signal and the different backgrounds considered at $\sqrt{s} = 800$ GeV. Initial
state radiation and beamstrahlung have been included. Also listed is the number of generated events for
every process.

Due to the extremely small signal-to-background (S/B) ratio, the philosophy of the
analysis in both decay channels will be to start by applying a standard cut-based preselection
in order to remove as much background as possible while keeping a high efficiency for the
signal. Then, in order to further improve the statistical sensitivity, a multivariate analysis
will be performed. At this stage our problem will be how to make optimal use of the
statistical information from a set of $N$ distributions discriminating between signal and
background. It can be proven [19] that it is possible to make an optimal projection from
this input $N$-dimensional space to a 1-dimensional space$^1$:

a) without loss of sensitivity on the class proportions and

b) with a probabilistic interpretation (in terms of the a-posteriori Bayesian probability
of being of signal type).

This projection can be performed by using Neural Network (NN) techniques, which have 
become increasingly popular in High Energy Physics in the last few years. 

Even after selection, an S/B ratio much larger than 1 can only be obtained at the
expense of a rather low signal efficiency. We have considered that the uncertainty on
the background normalization after selection is going to be the dominant contribution
to the systematic error. The main concern is how well the parton shower can reproduce the
tails of the distributions for the non-interfering background processes. In order to be
limited only by that, it is important to have event generators with 8 fermions
in the final state available, thus properly accounting for all interfering backgrounds. The main
background process after selection is $t\bar{t}$. Therefore, theoretical calculations up to $O(\alpha_s^2)$
would be needed by the time this measurement is performed.

For a given systematic uncertainty in the background normalization, it is possible
to adjust the selection signal efficiency in order to optimize the total uncertainty in the
Yukawa coupling. We have set as a goal a 5% systematic uncertainty in the background
normalization in order not to dominate the measurement. Then, we have optimized the
selection assuming this uncertainty.

$^1$ In the general case of $m$ existing classes to be discriminated, the optimal projection is performed in an
$(m-1)$-dimensional space. In our problem, all backgrounds are considered inclusively and $m = 2$, thus
the optimal projection being 1-dimensional.

4.1 Semileptonic Channel 

The semileptonic decay channel, with a branching ratio of 43.9%, constitutes the golden 
channel in terms of high statistics and clean signature as compared to the fully hadronic 
decay channel, where 8 jets have to be reconstructed in the final state. The final state is: 

$$e^+e^- \to t\bar{t}H \to q\bar{q}b\,\ell\nu\bar{b}\,b\bar{b},$$

the experimental signature being then: 4 b-jets + 2 light-quark jets + $\ell$ + $E_{\rm miss}$. Hence,
the high-$p_T$, isolated lepton can be used for triggering purposes and might provide
clear separation in the offline selection as well. The four-momentum imbalance due to the 
neutrino presence will also represent a discriminant variable as long as the final detector 
has a good hermeticity. Finally, the high content in b-jets will also be exploited as a 
powerful discriminant variable by using the vertex detector. 

As already stated, the philosophy of the selection has been to start with a series
of preselection cuts aimed at removing as many background events as possible while
keeping a high efficiency for the signal. The preselection variables are compared for signal
and background in Figs. 8 and 9, along with the cuts applied. The selected events are
required to have a visible mass larger than 500 GeV but lower than 800 GeV, more
than 60 energy flow (EF) objects reconstructed and at least 6 jets reconstructed with
the JADE [20] jet-clustering algorithm for a resolution parameter $y_{\rm cut} = 10^{-3}$. Then a
series of cuts on topological variables such as the thrust and the normalized Fox-Wolfram
moments of the event are applied. These cuts are mainly aimed at drastically reducing
the contamination from high cross-section backgrounds such as $W^+W^-$ or radiative $q\bar{q}$, which
tend to be much less spherical than signal events (due to the boost) and have a large value
for these variables (see Fig. 9). Also useful are the so-called high and low jet masses of
the event, PmH and PmL, respectively. The event is divided into two hemispheres and
particles are assigned to either hemisphere so as to minimize the quadratic sum of the
two hemisphere invariant masses. For processes with two resonances (such as $W^+W^-$
or $ZH$), these distributions tend to show resonant structures around the true invariant
masses.

At this stage, and in order to reconstruct the $t\bar{t}H$ semileptonic decay signature, an
energetic and isolated lepton has to be identified. This lepton candidate has been chosen
as the charged track which maximizes $E_\ell (1 - \cos\theta_{\ell j})$, where $E_\ell$ is the track energy and $\theta_{\ell j}$
the angle between the track and the closest of the 6 jets into which the remaining EF objects
have been forced by using JADE. The efficiency of this algorithm to find the correct
lepton (the one from the W decay) for $t\bar{t}H$ semileptonic events has been determined to be ...

Once the event has been clustered into 6 jets plus 1 lepton, and after rejecting those
events having jets with fewer than 3 EF objects, a probability to contain no lifetime is
assigned to each jet as described in Sect. 3.2. This allows us to build a powerful variable
which reflects the lifetime content of the event by adding up the probabilities of the four
most b-like jets (those with the lowest no-lifetime probability).
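The lepton-candidate choice used in this channel, the charged track maximizing $E_\ell (1 - \cos\theta_{\ell j})$, can be sketched as follows. This is a simplified illustration with hypothetical names: the jet momenta are taken as given, whereas in the analysis the remaining EF objects are re-clustered into 6 jets with JADE for each candidate.

```python
import math


def cos_angle(p, q):
    """Cosine of the opening angle between two 3-momenta."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def isolation_score(energy, momentum, jet_momenta):
    """E * (1 - cos(theta)), with theta the angle to the closest jet."""
    closest_cos = max(cos_angle(momentum, jet) for jet in jet_momenta)
    return energy * (1.0 - closest_cos)


def pick_lepton(tracks, jet_momenta):
    """Index of the (energy, momentum) track with the largest isolation score."""
    return max(range(len(tracks)),
               key=lambda i: isolation_score(tracks[i][0], tracks[i][1], jet_momenta))


# Hypothetical event: energies in GeV, momenta as (px, py, pz).
tracks = [
    (50.0, (50.0, 0.0, 0.0)),   # energetic track inside a jet
    (30.0, (0.0, 30.0, 0.0)),   # softer but well isolated track
]
jet_momenta = [(100.0, 0.0, 0.0), (-100.0, 0.0, 0.0)]

lepton_index = pick_lepton(tracks, jet_momenta)
print("lepton candidate:", lepton_index)
```

The product form favours tracks that are both energetic and far from any jet: a hard track collinear with a jet scores zero, however large its energy.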

In Table 2 the preselection efficiencies for the signal and the different backgrounds are
displayed. The overall effective cross-section for the background
is 17.60 fb, while for the signal it is only 0.61 fb. This translates into such a poor sample
purity ($\rho \simeq 3.3\%$) that any uncertainty in the background normalization completely erases
any significance of the signal.

Table 2: Semileptonic channel preselection efficiencies and effective cross-sections.



However, there are several variables which, after the preselection, still have some
discriminating power. We will use a NN in order to project this $N$-dimensional
information into a 1-dimensional variable in an optimal way.

Nine variables showing high discrimination at the preselection level (see Fig. 10) have
been chosen to train a NN to separate the signal from the background. Three of these
variables use the information from the leptonic decay of one of the W bosons (lepton energy,
angle between the lepton and the closest jet, and invariant mass of the lepton and the missing
momentum), two use the b-jet content of the event (global no-lifetime probability for
the event and sum of the no-lifetime probabilities of the four most b-like jets) and four
are topological variables (thrust, aplanarity, number of jets clustered with JADE for
$y_{\rm cut} = 10^{-3}$ and the total visible mass).

After training the NN, the distribution of weights for the different neurons allows one to
determine the discriminant power of each of the 9 input variables (see Table 3). It can be
seen that the most discriminating variables are those containing b-tagging information,
followed by the variables related to the identified lepton.

The NN output distributions for the signal and the different backgrounds are compared
in Fig. 11. The histograms have been normalized to the same integrated luminosity (1000
fb$^{-1}$) to show how, even after having used the information in the 9 variables, the S/B
ratio is smaller than one in all the bins.

Table 3: Discriminant power of each of the 9 input variables of the semileptonic selection Neural Network.

Applying a cut on the NN output variable allows us to perform an optimal decision
test. A measurement of the $t\bar{t}H$ cross-section, and hence of the top Yukawa coupling,
can then be made assuming knowledge of the expected number of background events.
However, it is necessary to take into account the systematic uncertainty in the background
normalization which, as already mentioned in Sect. 4, has been assumed here to be at the
5% level.

In Fig. 6 the evolution of both the statistical and systematic uncertainties (as defined
above) is plotted as the cut on the NN output is varied. Actually, the horizontal scale is
not directly the NN output, but the a-posteriori Bayesian probability of being of signal
type for the expected proportions of signal and background, which can be computed as
a function of the NN output. It can be seen how, by cutting harder on the NN output,
the systematic error coming from the background normalization can be kept under control,
since a higher sample purity is achieved. The optimal cut is found within the plateau in
the total error for the minimum possible systematic uncertainty.
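The conversion from NN output to a-posteriori probability can be illustrated under one common assumption, which is not stated in the text: that the network was trained on equal numbers of signal and background events, so that its output $o$ approximates $f_S/(f_S + f_B)$. Reweighting by the expected yields then gives $P(S|o) = N_S\,o / [N_S\,o + N_B\,(1-o)]$. The function name is hypothetical; the yields use the preselected semileptonic cross-sections for 1000 fb$^{-1}$.

```python
def posterior_signal_probability(nn_output: float, n_signal: float, n_background: float) -> float:
    """A-posteriori signal probability for a given NN output and expected yields.

    Assumes the NN was trained with equal signal and background proportions.
    """
    weighted_signal = n_signal * nn_output
    total = weighted_signal + n_background * (1.0 - nn_output)
    return weighted_signal / total if total > 0.0 else 0.0


N_SIGNAL = 0.61 * 1000.0       # 0.61 fb after preselection, 1000 fb^-1
N_BACKGROUND = 17.60 * 1000.0  # 17.60 fb after preselection, 1000 fb^-1

for output in (0.50, 0.95, 0.99):
    prob = posterior_signal_probability(output, N_SIGNAL, N_BACKGROUND)
    print(f"NN output {output:.2f} -> signal probability {prob:.3f}")
```

With such unbalanced proportions even an NN output of 0.95 corresponds to a signal probability below one half, which illustrates why the horizontal scale of Fig. 6 differs from the raw NN output.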

In Table 4, the statistical and systematic uncertainties are compared at different steps,
illustrating the usefulness of the multivariate analysis. As already mentioned, the sample
purity after preselection is so poor that not only does the statistical error degrade because
of statistical fluctuations in the background, but the systematic error (see Sect. 2)
introduced as a result of the 5% uncertainty in the background normalization leads effectively
to the absence of a measurement. The situation is dramatically improved when using a
multivariate analysis and the top-Higgs Yukawa coupling is determined from a fit to the
output neuron distribution. The systematic uncertainty can be further controlled (at the
expense of slightly increasing the statistical uncertainty) by choosing a suitable cut on the
NN output distribution.

Figure 6: Evolution of the error in the top Yukawa coupling as a function of the NN output cut
(semileptonic channel).

Table 4: Statistical and systematic uncertainties in the top-Higgs Yukawa coupling from the semileptonic
channel. An integrated luminosity of 1000 fb$^{-1}$ has been assumed.



4.2 Hadronic Channel 

The fully hadronic decay channel, with a branching ratio of 45.6%, benefits from high
statistics but is the most difficult one in terms of signal-to-background discrimination
and experimental reconstruction, and the one potentially most affected by systematic
uncertainties. The final state is:

$$e^+e^- \to t\bar{t}H \to q\bar{q}b\,q\bar{q}\bar{b}\,b\bar{b},$$

the experimental signature being quite challenging: 8 jets in the final state, out of which
4 are b-jets. This imposes quite stringent requirements on the vertex detector. Due to
limitations in the jet clustering algorithms, the ability to properly reconstruct 8 jets is not
dominated by the detector performance. In this sense, this channel does not impose strict
requirements as far as the tracker momentum resolution and the calorimeter energy resolution
and granularity are concerned.

Potential backgrounds are genuine multi-jet processes like $t\bar{t}$ and $t\bar{t}Z$. In fact, the
large cross-section of $t\bar{t}$, together with hard gluon emission easily emulating 8 jets, makes
this process the most important background after selection. Of particular concern is $t\bar{t}g^*$,
with the gluon splitting into a $b\bar{b}$ pair, since it easily emulates 4 b-jets in the final state,
even though the invariant mass of the $b\bar{b}$ pair from the gluon splitting peaks at low values.
Since the assumed Higgs mass is relatively close to the Z mass, $t\bar{t}Z$ constitutes an almost
irreducible background. The main reason is that the invariant mass resolution of the $b\bar{b}$
system becomes seriously degraded due to particle mixing between jets in such a populated
environment and to energy losses through neutrino emission in the B-meson cascade decays.
On the other hand, and again due to the large cross-section and the limitations of b-tagging
in such a busy environment, hadronic final states such as $W^+W^- \to q_1\bar{q}_2 q_3\bar{q}_4 g^*$, where the
gluon splits into a $b\bar{b}$ pair, will lead to the necessity of a stronger selection and therefore
a reduced statistical sensitivity to the Yukawa coupling.

As for the semileptonic channel, a standard cut-based preselection is applied in order
to remove as much background as possible before the multivariate analysis. The selected
events are required to have a visible mass in excess of 70% of $\sqrt{s}$ (that is, 560 GeV), more than
120 EF objects reconstructed and at least 7 jets reconstructed with the JADE algorithm
for a resolution parameter $y_{\rm cut} = 10^{-3}$. Then the event is forced to have exactly 8 jets
reconstructed by using JADE. Further preselection cuts require a minimum of 2 EF objects
per jet, a minimum di-jet invariant mass of 20 GeV and a thrust cut at 0.85. The
preselection variables are compared for signal and background in Fig. 12, along with
the cuts applied. The preselection efficiencies and effective cross-sections for the different
processes considered are listed in Table 5.



Process                                          $\epsilon$ (%)    $\sigma_{\rm eff}$ (fb)
$t\bar{t}H \to 8q$                               77.06             0.90
$t\bar{t}H \to 6q\ell\nu + 4q\ell\nu\ell\nu$     9.63              0.14
$q\bar{q}$ (5 flav.)                             0.378             4.38
$t\bar{t}$                                       5.02              15.22
$t\bar{t}Z$                                      27.35             1.25
$W^+W^-$                                         0.185             8.14
$ZZ$                                             0.139             0.43
Total Bckg                                                         29.55

Table 5: Hadronic channel preselection efficiencies and effective cross-sections.

After preselection, the efficiency for signal is reduced to 77% and the sample purity is
only $\sim 3.0\%$.
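The purity quoted above can be reproduced from the effective cross-sections of Table 5. The sketch below rests on an inference rather than a statement in the text: the total background of 29.55 fb is recovered (up to rounding) only if the non-hadronic $t\bar{t}H$ decays are counted as background. The dictionary keys are illustrative labels.

```python
# Effective cross-sections after preselection, in fb (Table 5).
signal_fb = 0.90  # ttH -> 8q
backgrounds_fb = {
    "ttH non-hadronic": 0.14,  # counted as background here (an inference, see lead-in)
    "qq (5 flav.)": 4.38,
    "tt": 15.22,
    "ttZ": 1.25,
    "WW": 8.14,
    "ZZ": 0.43,
}

total_background_fb = sum(backgrounds_fb.values())
purity = signal_fb / (signal_fb + total_background_fb)

print(f"total background: {total_background_fb:.2f} fb")
print(f"purity:           {100 * purity:.1f}%")
```

The sum differs from the tabulated total only in the last digit, as expected from rounding of the individual entries.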

Note the high remaining cross-section for $q\bar{q}$ and $W^+W^-$ despite the cuts applied,
e.g. on the minimum di-jet invariant mass. The main cause is hard gluon radiation. In
Table 6 the average number of gluons with $P_g > 20$ GeV/c, the average gluon momentum and the
angle of the gluon with respect to the parent quark are listed for both processes, before and
after preselection. Indeed, multi-gluon radiation (as simulated by the parton shower) leads to
genuine 8-jet final states even for processes like $q\bar{q}$ (5 flav.) and $W^+W^-$, with only 2 and 4 initial
partons, respectively. Owing to the large cross-section of these processes, they constitute non-negligible
backgrounds which have to be taken into account in any "realistic simulation".



Process                  $\langle N_g \rangle$    $\langle P_g \rangle$ (GeV/c)    $\langle \theta_{gq} \rangle$ (deg.)
$q\bar{q}$ (5 flav.)     4.2 (6.9)                64.6 (50.0)                      16.0 (44.4)
$W^+W^-$                 3.8 (4.0)                61.6 (45.4)                      6.3 (22.7)

Table 6: Hard gluon ($P_g > 20$ GeV/c) radiation as predicted by the parton shower at $\sqrt{s} = 800$ GeV, before
and after (in parentheses) preselection.



As can be observed in Fig. 12, the preselection variables after cuts still have discriminating
power between signal and background. In order to use these variables optimally,
they are further used together with two more variables to train a Preselection Neural
Network. The two variables added (shown as the last two variables in Fig. 12) provide
information about the lifetime content of the event: the logarithm of the event probability
to contain no lifetime and the difference between the probability of the fourth jet and
the first jet (sorted from the most b-like to the least b-like). Fig. 7 shows the
Preselection Neural Network output, after training, for both signal and background. No
cut is applied on this distribution; it is rather used as a discriminant variable.




Figure 7: Hadronic preselection Neural Network output (horizontal axis: NNPresel output). Signal (solid)
and background (dashed) have been normalized to the same number of events.

There are 10 more variables which discriminate between signal and background
(see Fig. 13). Most of them describe the global event topology:

• Njets(LUCLUS, dcut=6.5 GeV): number of jets found with the LUCLUS [?] jet-clustering
algorithm for a distance measure of 6.5 GeV;

• PmH and PmL: high and low jet masses of the event, already described;

• Max(Ejet)-Min(Ejet): difference between maximum and minimum jet energy;

• Evis: total visible energy of the event;

• Thrust, Oblateness, Aplanarity;

others contain information about flavor tagging (SumPbtagOrd: sum of the probability
to contain no lifetime for the four most b-like jets) or Higgs mass reconstruction:

• Reco Higgs mass: reconstructed Higgs mass. The 4 most b-like jets are assumed
to be the b-jets, which reduces the number of possible jet assignments to 36. The
combination which maximizes

$$\mathcal{P}(m_{i_1 i_2 i_3} - m_t) \times \mathcal{P}(m_{i_4 i_5 i_6} - m_t) \times \mathcal{P}(m_{i_1 i_2} - m_W) \times \mathcal{P}(m_{i_4 i_5} - m_W) \times \mathcal{P}(m_{i_7 i_8} - m_H)$$

is selected. In the above expression, $\mathcal{P}$ are the probability density functions for the correct
jet pairing and $m_{ij}$ ($m_{ijk}$) is the invariant mass of jets $i$ and $j$ ($i$, $j$ and $k$).
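The count of 36 jet assignments can be reproduced by direct enumeration. The sketch below shows one counting that gives this number, assuming the two top candidates are interchangeable: 6 ways to choose the two b-jets of the Higgs, 3 ways to pair the four light jets into W candidates, and 2 ways to attach the remaining b-jets to the W candidates. The function name and the jet labels are hypothetical.

```python
from itertools import combinations


def jet_assignments(b_jets, light_jets):
    """Enumerate the distinct (H, t1, t2) assignments for 4 b-jets and 4 light jets."""
    assignments = []
    for higgs_pair in combinations(b_jets, 2):
        top_b = [b for b in b_jets if b not in higgs_pair]
        first = light_jets[0]
        # Fixing the partner of the first light jet avoids counting each W pairing twice.
        for partner in light_jets[1:]:
            w1 = (first, partner)
            w2 = tuple(j for j in light_jets if j not in w1)
            # Two ways to attach the remaining b-jets to the two W candidates.
            assignments.append({"H": higgs_pair, "t1": (w1, top_b[0]), "t2": (w2, top_b[1])})
            assignments.append({"H": higgs_pair, "t1": (w1, top_b[1]), "t2": (w2, top_b[0])})
    return assignments


b_jets = ["b1", "b2", "b3", "b4"]        # the four most b-like jets
light_jets = ["q1", "q2", "q3", "q4"]    # the remaining four jets

all_assignments = jet_assignments(b_jets, light_jets)
print(len(all_assignments))
```

In the reconstruction each of these assignments would be scored with the product of probability densities given above, and the highest-scoring one kept.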

These variables, together with the Preselection Neural Network output distribution,
are used to train a Selection Neural Network. After training, it is found that the most
discriminating variables are the Preselection Neural Network output, the reconstructed Higgs
mass, thrust and aplanarity (see Table 7). The output neuron distribution is compared
for signal and background in Fig. 14.



As for the semileptonic decay channel, Table 8 compares the statistical and systematic
uncertainties (the latter from a 5% background normalization uncertainty only) on the
top-Higgs Yukawa coupling at different steps of the analysis.

Variable                               Discriminant Power (%)
$E_{vis}$                              5.9
$\max(E_{jet}) - \min(E_{jet})$        7.6
$N_{jets}$ (LUCLUS)                    6.9
Thrust                                 13.3
Aplanarity                             11.5
Oblateness                             5.4
High jet mass                          7.3
Low jet mass                           8.2
$\sum_{i=1,4} P_{btag}^{jet\,i}$       6.5
$M_H^{reco}$                           11.7
$NN_{presel}$                          15.7

Table 7: Discriminant power of each of the 11 input variables of the hadronic selection Neural Network.



                            ε (%)   S/B    Stat. error (%)   Syst. error (%)   Total error (%)
After Preselection          77.1    0.03   9.8               83.5              83.5
Fit to NNSel (50 bins)      "       "      4.2               13.7              14.3
NNSel>0.95 (optimal cut)    8.5     0.90   7.3               3.0               7.9

Table 8: Statistical and systematic uncertainties in the top-Higgs Yukawa coupling from the fully hadronic
channel. An integrated luminosity of 1000 fb$^{-1}$ has been assumed.
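
The total errors quoted in Table 8 for the two Selection Neural Network rows appear consistent with adding the statistical and systematic errors in quadrature. This is an assumption inferred from the numbers rather than something stated in the text; the short check below, with the illustrative helper `total_error`, reproduces those two totals from the tabulated inputs.

```python
import math

def total_error(stat, syst):
    """Combine statistical and systematic errors in quadrature.

    Assumes the two contributions are uncorrelated (an assumption made
    for this check, not a statement from the analysis).
    """
    return math.hypot(stat, syst)

# (stat, syst, quoted total) in percent, copied from Table 8.
rows = {
    "Fit to NNSel (50 bins)": (4.2, 13.7, 14.3),
    "NNSel>0.95 (optimal cut)": (7.3, 3.0, 7.9),
}

for label, (stat, syst, quoted) in rows.items():
    computed = total_error(stat, syst)
    print(f"{label}: computed {computed:.1f}%, quoted {quoted}%")
```

The comparison illustrates the trade-off visible in the table: the fit to the full output distribution has the smaller statistical error but a larger exposure to the background normalization, whereas the tight cut reverses the balance.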
5 Conclusions

The reaction $e^+e^- \to t\bar{t}H$ allows a direct determination of the top-Higgs Yukawa coupling
through the measurement of its total cross-section. We have studied the feasibility of this
measurement for $M_H = 120$ GeV at a future $e^+e^-$ linear collider operating at $\sqrt{s} = 800$ GeV
and assuming 1000 fb$^{-1}$ of integrated luminosity.

The analysis has been performed for both the hadronic and semileptonic decay channels,
which constitute almost 90% of all decays. For both of them, several sources of background
have been considered, including not only the interfering ones like $t\bar{t}Z$ (although
interference has been neglected) but also non-interfering ones like $q\bar{q}$ or $W^+W^-$,
which have huge cross-sections compared to that of the signal.

In both cases, a set of variables with high discriminating power has been chosen to
perform a multivariate analysis, so that their N-dimensional information is used as
optimally as possible. For the two studied channels, topological variables have been used
together with others containing b-tagging information.

Our final results show the statistical uncertainties that can be achieved in each channel
for an integrated luminosity of 1000 fb$^{-1}$. We have also estimated the systematic
uncertainty associated with a 5% uncertainty in the overall background normalization
(even though the main remaining background is $t\bar{t}$), as a way to quantify the
importance of improving this theoretical uncertainty by the time the measurement
might be performed.

As a final result, we quote the combination of the two channels, treating the
systematic uncertainty as fully correlated between them, which leads to a total uncertainty
in the top-Higgs Yukawa coupling of ~5.5%. The statistical uncertainty alone would be
~4.2%.
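
Assuming, as a reading of the quoted numbers rather than a statement from the analysis, that the total uncertainty is the quadrature sum of the statistical and systematic parts, the combined systematic component implied by these two figures would be roughly:

```latex
\[
\sigma_{\rm syst} \approx \sqrt{\sigma_{\rm tot}^{2} - \sigma_{\rm stat}^{2}}
= \sqrt{(5.5\%)^{2} - (4.2\%)^{2}} \approx 3.6\%
\]
```

Since both inputs are quoted to two significant figures, this value should be taken as indicative only.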
Figure 8: Preselection variables for the semileptonic channel (I). Signal (solid) and background (dashed)
have been normalized to the same number of events. The background histograms have been built by adding
all the different background contributions weighted according to their relative cross-sections.

Figure 9: Preselection variables for the semileptonic channel (II). Signal (solid) and background (dashed)
have been normalized to the same number of events. The background histograms have been built by adding
only the $W^+W^-$ and $q\bar{q}$ contributions weighted according to their relative cross-sections. In this way the
differences between the signal and the most topologically different backgrounds are clearly visualized.

Figure 10: Selection Neural Network variables for the semileptonic channel. Signal (solid) and
background (dashed) have been normalized to the same number of events. The background histograms have
been built by adding all the different background contributions weighted according to their relative
cross-sections.

Figure 11: Semileptonic Selection Neural Network output. The different contributions are normalized
to the same luminosity (1000 fb$^{-1}$).

Figure 12: Preselection variables for the hadronic decay channel. Signal (solid) and background (dashed)
have been normalized to the same number of events. The background prediction has been computed by
adding all the different background contributions weighted according to their relative cross-sections.

Figure 13: Selection Neural Network variables for the hadronic decay channel. Signal (solid) and
background (dashed) have been normalized to the same number of events. The background prediction has
been computed by adding all the different background contributions weighted according to their relative
cross-sections.

Figure 14: Hadronic selection Neural Network output. Top: Signal (solid) and background (dashed)
have been normalized to the same number of events. Bottom: comparison of signal (shaded) and
background (dashed) for $L = 1000$ fb$^{-1}$.